1 Executive Summary


  • Carbohydrates and protein ratios of the diet seem to affect BMI while snacking doesn’t seem to.
  • Higher consumption of one kind of snack predicted higher consumption of the other.
  • Weekly exercise duration had no correlation with BMI, regardless of type.


2 Full Report

2.1 Initial Data Analysis (IDA)

We collected the data by online survey on ed, DATA1001 discord and r/USYD

DATA CLEANING:
We removed the row that was impossible (7000 meals a week). We changed the values that seemed like mistakes in units (1.62 m and 1.71 m were changed into centimeters; one row with 90, 180, 90 minutes of exercise was changed into hours). We marked the empty cells with -9.

BMI was calculated as \(weight/height^2\), with units of kg/m2 (Body Mass Index (BMI) Calculator, 2021)

library(tidyverse)
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.0.5
library(plotly)
# Read in your survey data
survey = read.csv("Survey - How does the type of diet and Exercise Affect BMI _.csv")
#removing the variable Timestamp
survey = select(survey, -Timestamp)
#fixing the names
my_names <- c("Gender",
              "Weight",
              "Height",
              "LowIntensity",
              "ModerateIntensity",
              "HighIntensity",
              "ProteinMeals",
              "CarbMeals",
              "VegMeals",
              "Dessert",
              "ProcessedFood")

names(survey) <- my_names
#creating the variable BMI
survey <-within(survey,BMI <- (Weight)/(Height*0.01)^2)

#looking at the outliers
#ggplot(survey, aes(y = ProteinMeals + CarbMeals + VegMeals, x = )) +
 # geom_boxplot()
#no lower outliers, two upper outliers

Structure of our Data

str(survey)
## 'data.frame':    43 obs. of  12 variables:
##  $ Gender           : chr  "Male" "Male" "Female" "Male" ...
##  $ Weight           : num  60 82 45 70 55 52 63 45 135 127 ...
##  $ Height           : num  174 172 160 162 158 ...
##  $ LowIntensity     : num  6 6 2 2 4 6 2 8 0 1 ...
##  $ ModerateIntensity: int  2 2 1 4 2 2 1 0 0 5 ...
##  $ HighIntensity    : num  6 2 0 0 0 1 0 0 0 5 ...
##  $ ProteinMeals     : int  12 19 11 10 18 4 10 5 5 12 ...
##  $ CarbMeals        : int  8 10 12 7 14 3 2 4 8 5 ...
##  $ VegMeals         : int  7 9 3 1 14 5 2 8 4 4 ...
##  $ Dessert          : int  5 3 0 3 1 1 0 15 12 2 ...
##  $ ProcessedFood    : int  10 1 4 3 1 0 1 15 12 5 ...
##  $ BMI              : num  19.8 27.7 17.6 26.7 22.2 ...

Numerical Summaries

summary(survey)
##     Gender              Weight           Height       LowIntensity   
##  Length:43          Min.   : 44.00   Min.   :150.0   Min.   : 0.000  
##  Class :character   1st Qu.: 54.00   1st Qu.:163.0   1st Qu.: 2.000  
##  Mode  :character   Median : 60.00   Median :169.0   Median : 2.000  
##                     Mean   : 65.15   Mean   :168.8   Mean   : 3.651  
##                     3rd Qu.: 70.00   3rd Qu.:175.0   3rd Qu.: 5.000  
##                     Max.   :135.00   Max.   :183.0   Max.   :24.000  
##  ModerateIntensity HighIntensity    ProteinMeals      CarbMeals     
##  Min.   :0.000     Min.   :0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:0.000     1st Qu.:0.000   1st Qu.: 6.000   1st Qu.: 6.000  
##  Median :1.000     Median :0.000   Median : 8.000   Median : 7.000  
##  Mean   :1.837     Mean   :1.198   Mean   : 9.047   Mean   : 8.465  
##  3rd Qu.:2.500     3rd Qu.:1.250   3rd Qu.:11.000   3rd Qu.:11.000  
##  Max.   :9.000     Max.   :7.000   Max.   :21.000   Max.   :21.000  
##     VegMeals         Dessert       ProcessedFood         BMI       
##  Min.   : 0.000   Min.   : 0.000   Min.   :-9.000   Min.   :16.33  
##  1st Qu.: 3.000   1st Qu.: 2.000   1st Qu.: 1.500   1st Qu.:19.08  
##  Median : 5.000   Median : 4.000   Median : 3.000   Median :21.31  
##  Mean   : 6.279   Mean   : 5.326   Mean   : 3.674   Mean   :22.78  
##  3rd Qu.: 8.500   3rd Qu.: 7.000   3rd Qu.: 5.000   3rd Qu.:23.70  
##  Max.   :21.000   Max.   :19.000   Max.   :15.000   Max.   :46.11
#cor(survey[ , -1])
#plot(survey)

We created the variables for ratio, which was Meal Count of Type/Total Meal Count for major meals.

#creating the meal type ratio variables. 
#Type of Meal Count / Total Meal Count 
ratios <- survey %>%
  mutate(CRatio = CarbMeals/(ProteinMeals+CarbMeals + VegMeals)) %>% 
  mutate(PRatio = ProteinMeals/(ProteinMeals+CarbMeals + VegMeals)) %>% 
  mutate(VRatio = VegMeals/(ProteinMeals+CarbMeals + VegMeals)) %>% 
  select(Gender, BMI, CRatio, VRatio, PRatio) 
#tidier for ggplot
longratios <-  ratios %>% pivot_longer(c("CRatio", "VRatio", "PRatio"), names_to = "RatioType", values_to = "MealRatio")
summary(longratios)
##     Gender               BMI         RatioType           MealRatio     
##  Length:129         Min.   :16.33   Length:129         Min.   :0.0000  
##  Class :character   1st Qu.:19.05   Class :character   1st Qu.:0.2353  
##  Mode  :character   Median :21.31   Mode  :character   Median :0.3333  
##                     Mean   :22.78                      Mean   :0.3333  
##                     3rd Qu.:23.73                      3rd Qu.:0.4286  
##                     Max.   :46.11                      Max.   :1.0000
str(longratios)
## tibble [129 x 4] (S3: tbl_df/tbl/data.frame)
##  $ Gender   : chr [1:129] "Male" "Male" "Male" "Male" ...
##  $ BMI      : num [1:129] 19.8 19.8 19.8 27.7 27.7 ...
##  $ RatioType: chr [1:129] "CRatio" "VRatio" "PRatio" "CRatio" ...
##  $ MealRatio: num [1:129] 0.296 0.259 0.444 0.263 0.237 ...

LIMITATIONS

Firstly, our data was biased as there were more females than males, and people possibly undercounted the amount of vegetables and carbohydrates they eat, as they are often eaten as side dishes.

#to check for bias in our sample. 
ggplot(survey ,aes(x = Gender)) + geom_bar() + ylab("")+ ggtitle("Participants Divided by Gender")

Secondly, error was introduced by the different meal sizes between individuals. Total calories consumed would have been a valuable variable but it would have been hard to measure by individuals. Also, it might be hard to classify each type of food correctly, for example, sausages.


2.2 Research Questions


2.2.1 Is there a difference in BMI due to gender

Females had a higher average BMI and also the maximum BMI. Neither graph was normal, and both were skewed left, this implies our data was too small.

ggplot(survey,aes(x=BMI,fill=Gender,))+geom_histogram(binwidth=1,color="black")+facet_grid(~Gender)+theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank())

2.2.2 Is there a difference between the amounts of each type of meal consumed?

Protein based meals had the highest median consumption, while vegetable based meals had the lowest. Spreads were similar. It’s interesting that 0 is not an outlier.

NewSurvey = survey %>% select(CarbMeals, ProteinMeals, VegMeals) 
nsrvynames = c("Carbohydrate Based Meals",
               "Protein Based Meals",
               "Vegetable Based Meals")

names(NewSurvey) = nsrvynames



NewSurvey <- NewSurvey %>% pivot_longer(c("Carbohydrate Based Meals", 
                                          "Protein Based Meals", 
                                          "Vegetable Based Meals"), names_to = "MealType", values_to = "MealCount")
ya <- list(title = "", titlefont = F )
xa <- list(title = "Weekly Meal Count", titlefont = F )
#https://plotly.com/r/figure-labels/
  #(*Axes Labels in R*)

p4 = plot_ly(NewSurvey, x = ~MealCount, y = ~MealType, color = ~MealType, type = 'box') 
p4 <- p4%>% layout(showlegend = F,
                   title = "Weekly Meal Counts by Type",
                   xaxis = xa, 
                   yaxis = ya)

p4 

2.2.3 Is there a linear relationship between type of diet and BMI?


2.2.3.1 Major Meal Types’ Effects


Correlation between Meal Ratio’s and BMI and their R2 values respectively

Protein Ratios

#Protein Ratios and BMI
PrBMI = lm(BMI ~ PRatio, data = ratios)
cor(ratios$BMI, ratios$PRatio)
## [1] 0.4243385
summary(PrBMI)$r.squared
## [1] 0.1800632

Carbohydrate Ratios

CrBMI = lm(BMI ~ CRatio, data = ratios)
cor(ratios$BMI, ratios$CRatio)
## [1] -0.3358918
summary(CrBMI)$r.squared
## [1] 0.1128233

Vegetable Ratios

VrBMI = lm(BMI ~ VRatio, data = ratios)
cor(ratios$BMI, ratios$VRatio)
## [1] -0.2339149
summary(VrBMI)$r.squared
## [1] 0.05471616
ggplot(longratios, aes(x = MealRatio, y = BMI, colour = RatioType, shape = RatioType)) +
  geom_point() +
  geom_smooth(method = "lm", se = F)+
  xlab("Ratio of Each Type")  +ggtitle("Ratios of a Meal Count of a Given Type to Total Meal Count, in Relation to BMI")+
  scale_colour_discrete(name = "Meal Types",
                        breaks = c("CRatio", "PRatio", "VRatio"),
                        labels = c("Carbohydrate Based", "Protein Based", "Vegetable Based"))+
  scale_shape_discrete(name = "Meal Types",
                        breaks = c("CRatio", "PRatio", "VRatio"),
                        labels = c("Carbohydrate Based", "Protein Based", "Vegetable Based"))+
  theme(legend.position = "bottom")
## `geom_smooth()` using formula 'y ~ x'

There was a weak positive correlation between protein meal ratio and BMI (R2 = 0.1800632). There was an even weaker negative correlation between carbohydrate meal ratio and BMI (R2 = 0.1128233). There was no correlation between vegetable meal ratio and BMI (R2 = 0.0547162). The negative trend of carbohydrate is supported by the literature (Merchant et al., 2009), and it is possible the positive trend of protein is related to meat consumption.

ggplot(CrBMI) + 
  geom_point(aes(x=.fitted, y=.resid))+
  #source: https://community.rstudio.com/t/ggplot-makes-residual-plots/738  
    #(atiretoo, 2017)
  geom_abline(slope = 0, colour = "grey", alpha = 0.8)+
 # geom_smooth(aes(x=.fitted, y=.resid), se = F, colour = "Black")+
  ylab("Residuals") + xlab("Fitted Values") + ggtitle("Carbohydrate Ratio and BMI Residual Plot")

#they are random. 

ggplot(PrBMI) + 
  geom_point(aes(x=.fitted, y=.resid))+
   geom_abline(slope = 0, colour = "grey", alpha = 0.8)+
 # geom_smooth(aes(x=.fitted, y=.resid), se = F, colour = "Black")+
  ylab("Residuals") + xlab("Fitted Values") + ggtitle("Protein Ratios Residual Plot")

#they are random. 


A linear model is acceptable for carbohydrate ratio, but is not ideal for protein ratio, as the residuals have a downward trend, and are not homoscedastic.

2.2.3.2 Effects of Snack Consumption

#housekeeping for this section
snacks = survey %>% filter(ProcessedFood > 0)
newsnacks = survey %>% filter(ProcessedFood > 0) %>% pivot_longer(c("Dessert", "ProcessedFood"), names_to = "snacktype", values_to = "amount")


#looking at if the snacks consumption is internally related. 
ggplot(snacks, aes(x = ProcessedFood, y = Dessert)) + 
  geom_point() + 
  geom_smooth(method = "lm", colour = "Black", se = F)+ 
  xlab("Number of Times Proccesed Food Was Eaten Weekly") + ylab("Number of Dessert Protions Consumed Weekly") + ggtitle("Snacks in Relation to Each Other")
## `geom_smooth()` using formula 'y ~ x'

#linear model
DP = lm(Dessert ~ ProcessedFood, data = snacks)

We see that a higher consumption of one kind of snack (among desserts or processed food) somewhat correlates (R2 = 0.3469671) with a higher consumption of the other, implying the possibility of a common cause, such as habits.

#residuals
ggplot(DP) + 
  geom_point(aes(x=.fitted, y=.resid))+
   geom_abline(slope = 0, colour = "grey", alpha = 0.8)+
  ylab("Residuals") + xlab("Fitted Values") + ggtitle("Residual Plot")

#they are not random. 

The residual plot is random; thus the linear model is acceptable.

#Looking if there is a relationship between snack consumption and BMI
ggplot(newsnacks, aes(x = amount , y = BMI, colour = snacktype, shape = snacktype)) +
  geom_point() +
  xlab("Snacks Consumed (Servings/Week)" ) + ggtitle("BMI in Relation to Snack Consumption")+
  scale_colour_discrete(name = "Snack Types",
                        breaks = c("Dessert", "ProcessedFood"),
                        labels = c("Servings of dessert", "Servings of processed food"))+
   scale_shape_discrete(name = "Snack Types",
                        breaks = c("Dessert", "ProcessedFood"),
                        labels = c("Servings of dessert", "Servings of processed food"))+
    theme(legend.position = "bottom")

Correlation between weekly dessert consumption and BMI and its R2

#linear model of Bmi and desserts
DBMI = lm(BMI ~ Dessert, data = snacks)

#correlation
cor(snacks$BMI, snacks$Dessert)
## [1] -0.05235037
#r^2 
summary(DBMI)$r.squared
## [1] 0.002740561

Weekly processed food consumption and BMI’s correlation and its R2

#linear model of Bmi and processed food
PFBMI = lm(BMI ~ ProcessedFood, data = snacks)

#correlation
cor(snacks$BMI, snacks$ProcessedFood)
## [1] 0.07654133
#r^2
summary(PFBMI)$r.squared
## [1] 0.005858575

There is no significant linear correlation between either processed food (R2 = 0.0058586) or desserts (R2 = 0.0027406) and BMI. Others have found a negative relation (Ritchie, 2012).

Although more calories consumed leads to higher BMI (Togo et al., 2001), more meal count does not necessarily imply that.

2.2.4 Is there a relationship between amount of exercise done and BMI?

ggplot(survey,aes(main = "Body Mass Index by Type of Exercise", y = BMI))+
  geom_point(aes(x = LowIntensity, colour = "Low Intensity "),size = 3, shape = 20, show.legend = T) +
  geom_point(aes(x= ModerateIntensity, colour = "Moderate Intensity"),size = 3, shape = 17, show.legend = T)+
  geom_point(aes(x= HighIntensity, colour = "High Intensity"),size = 2.5, shape = 15, show.legend = T) +
  labs(title = "Body Mass Index by Type of Exercise", x = "Hours of Exercise by Type (per week)", y = "Body Mass Index", colour = "Exercise Type")

R2 Values

High Intensity

HBMI = lm(HighIntensity ~ BMI, data = survey)
summary(HBMI)$r.squared
## [1] 0.005816775

Moderate Intensity

MBMI = lm(ModerateIntensity ~ BMI, data = survey)
summary(MBMI)$r.squared
## [1] 6.855439e-06

Low Intensity

LBMI = lm(LowIntensity ~ BMI, data = survey)
summary(LBMI)$r.squared
## [1] 0.04629996

Exercise duration regardless of intensity was not correlated with BMI. Low intensity was most correlated (R2 = 0.0463), then came high intensity (R2 = 0.0058168), and finally the moderate intensity was impressively uncorrelated (R2 = 6.8554394^{-6}).


2.4 References

atiretoo. (2017, 16/09/2017). Ggplot() makes residual plots?! Retrieved 25/04/2021 from https://community.rstudio.com/t/ggplot-makes-residual-plots/738

Axes Labels in R. Plotly. Retrieved 25 April from https://plotly.com/r/figure-labels/

Body Mass Index (BMI) Calculator. (2021, 23 April). Canadian Diabetes Association. Retrieved 24 April from https://www.diabetes.ca/managing-my-diabetes/tools—resources/body-mass-index-(bmi)-calculator#:~:text=Body%20Mass%20Index%20is%20a,range%20is%2018.5%20to%2024.9.

Merchant, A. T., Vatanparast, H., Barlas, S., Dehghan, M., Shah, S. M. A., De Koning, L., & Steck, S. E. (2009). Carbohydrate intake and overweight and obesity among healthy adults. Journal of the American Dietetic Association, 109(7), 1165-1172.

Ritchie, L. D. (2012). Less frequent eating predicts greater BMI and waist circumference in female adolescents. The American journal of clinical nutrition, 95(2), 290-296.

Togo, P., Osler, M., Sørensen, T., & Heitmann, B. (2001). Food intake patterns and body mass index in observational studies. International journal of obesity, 25(12), 1741-1751.


2.5 Acknowledgments

(A):510098984 (R):500599576 (S):510221870

Every member contributed evenly in the sections 1 and 2.3. .
2.1’s dplyr section was done by A and the rest by R, but decisions and writing was done collectively.

  • A worked on sections 2.2.3 and 2.4, and helped with dplyr and general formatting.

  • R worked 2.2.4 and more on 2.1

  • S worked on 2.2.1 and 2.2.2

All 3 members came to all the meetings.

03/04/2021: The decision on the topic and the creation of the survey. Took about 2 hours. Next meeting was decided to be the week 6 lab class, to be able to have data to work with. Each person gave some ideas and then we evaluated them based on feasibility when it came to our data collection (survey).

13/04/2021: While cleaning the data, there were some illogical answers, and we needed to calculate BMI.
Had a discussion of possible research questions:

  • What’s the difference between the genders when it comes to BMI?

  • BMI-diet?

  • BMI-exercise?

  • Protein vs HighExercise (linear)

17/04/2021 We wrote the IDA and did the data cleanup, and had a discussion on the weaknesses of the study. We then divided the questions, not all questions that we had brainstormed were used.

20/04/2021 Discussion about how we were answering the questions.

24/04/2021 We got all the plots into one place, wrote 2.3., edited overall to shorten the length and for presentation.

27/04/2021 Final touches, lots of grammar edits, and video recording on zoom.